Streptococcus spp. genomes

Complete and chromosome-level RefSeq genomes of the genus Streptococcus were downloaded with ncbi-genome-download:

ncbi-genome-download  --parallel 8 --section refseq --assembly-level complete,chromosome --format fasta --genus "Streptococcus" bacteria

All genomes were cleaned for:

In total, 123 Streptococcus pneumoniae genomes and 452 non-pneumoniae Streptococcus genomes were prepared for KMC analysis.
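A minimal sketch of how the downloads could be split into the two groups. It assumes the default ncbi-genome-download layout (gzipped *_genomic.fna.gz files under refseq/bacteria/) and that the organism name appears in the first FASTA header of each file. The function split_by_species and the list-file names are hypothetical. Each list holds one path per line, which is the form KMC reads when its input is given as @list.

```shell
#!/usr/bin/env bash
# Sketch: split ncbi-genome-download output into an S. pneumoniae list and a
# non-pneumoniae list. ASSUMES the organism name appears in the first FASTA
# header of each *_genomic.fna.gz file (hypothetical helper, not the original
# pipeline).

# split_by_species <genome_dir> <pneu_list> <non_pneu_list>
split_by_species() {
    local dir="$1" pneu="$2" other="$3" f header
    : > "$pneu"     # create/truncate both output lists
    : > "$other"
    # Sort paths so the lists are identical between runs.
    find "$dir" -type f -name '*_genomic.fna.gz' | sort | while IFS= read -r f; do
        # Keep only the first (header) line of the decompressed file.
        header=$(gzip -cd "$f" 2>/dev/null | head -n 1)
        case "$header" in
            *"Streptococcus pneumoniae"*) printf '%s\n' "$f" >> "$pneu" ;;
            *)                            printf '%s\n' "$f" >> "$other" ;;
        esac
    done
}
```

For example, split_by_species refseq pneu.txt non_pneu.txt writes the two lists; the line counts of each file can then be checked against the totals above.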

Workflow

The following k-mer sizes (k) were selected for counting, where 100-105 denotes every k from 100 to 105: 21,31,33,100-105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195,200,205,210,220,230,235,240,245,250,255
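The k list mixes single values with ranges. A minimal sketch that expands it and, for each k, counts k-mers with KMC and dumps a histogram with kmc_tools. The helpers expand_k_list and count_spectra and the KMC/KMC_TOOLS override variables are hypothetical, and the counting options are assumptions about a sensible setup for pooled assemblies, not the settings actually used here.

```shell
#!/usr/bin/env bash
# Sketch: expand the k list and count k-mers with KMC for each k.
# expand_k_list and count_spectra are hypothetical helpers; the KMC options
# below are assumptions, not the settings used in the original analysis.

# expand_k_list "21,31,100-105" -> one k per line; "a-b" is an inclusive range.
expand_k_list() {
    local item IFS=','
    for item in $1; do
        case "$item" in
            *-*) seq "${item%-*}" "${item#*-}" ;;
            *)   printf '%s\n' "$item" ;;
        esac
    done
}

# count_spectra <genome_list> <out_prefix> <tmp_dir> <k_list>
# <genome_list> holds one FASTA path per line (passed to KMC as @list).
count_spectra() {
    local list="$1" prefix="$2" tmp="$3" klist="$4" k
    mkdir -p "$tmp" "$(dirname "$prefix")"
    for k in $(expand_k_list "$klist"); do
        # -fm: multi-FASTA input; -ci1: keep k-mers seen only once (KMC's
        # default -ci2 would drop them); -cs100000: raise the counter cap
        # above KMC's default of 255 so high depths are not clipped.
        "${KMC:-kmc}" -k"$k" -ci1 -cs100000 -fm @"$list" "${prefix}_k${k}" "$tmp"
        # Dump the spectrum: depth vs. number of distinct k-mers at that depth.
        "${KMC_TOOLS:-kmc_tools}" transform "${prefix}_k${k}" histogram "${prefix}_k${k}.hist"
    done
}
```

For example, count_spectra pneu.txt kmc/pneu kmc_tmp "21,31,33,100-105" writes one KMC database and one .hist file per k. Prefixing the call with KMC=echo KMC_TOOLS=echo prints the commands instead of running them, which is a cheap way to check the expanded list before a long run.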

k-mer Spectrum Analysis

The plot can be interactively inspected.

Interestingly, the plot showed a shift in the k-mer spectra between S. pneumoniae and non-pneumoniae genomes. The short k-mers (k = 21, 31, 33) served as a reference, since these sizes are commonly used by other k-mer-based tools for short reads. For the S. pneumoniae k-mers, the k-mer frequency changed steadily with depth (X-axis), following a smooth curve until 120. At depth 67, the frequencies of the 100s-mers and the 200s-mers flipped over, i.e. their curves crossed.
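Reading "flipped over" as the point where two spectra cross, a minimal sketch that reports the first depth at which the sign of (count in histogram A minus count in histogram B) reverses. It assumes two-column kmc_tools histograms (depth, number of k-mers) sorted by increasing depth. crossover_depth is a hypothetical helper, and comparing a single 100-mer histogram with a single 200-mer histogram is a simplification of comparing the two series.

```shell
#!/usr/bin/env bash
# Sketch: find the depth at which two k-mer spectra cross.
# ASSUMES two-column histograms (depth, count) sorted by increasing depth,
# as written by kmc_tools; crossover_depth is a hypothetical helper.

# crossover_depth <hist_a> <hist_b>
# Prints the first shared depth where sign(count_a - count_b) differs from the
# sign at the first shared depth with a non-zero difference; exits 1 if none.
crossover_depth() {
    awk '
        NR == FNR { a[$1] = $2; next }       # first file: depth -> count
        !($1 in a) { next }                  # skip depths missing from file A
        {
            diff = a[$1] - $2
            s = (diff > 0) ? 1 : ((diff < 0) ? -1 : 0)
            if (s == 0) next                 # ties carry no sign information
            if (ref == 0) { ref = s; next }  # sign before any crossing
            if (s != ref) { print $1; found = 1; exit }
        }
        END { if (!found) exit 1 }
    ' "$1" "$2"
}
```

For example, crossover_depth pneu_k100.hist pneu_k200.hist prints a single depth, or returns exit status 1 if the two curves never cross.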

When selecting a suitable k-mer size, k = 100 may be a good starting point, with some options: